Laboratory of Neurophenomics | Mood Biomarkers FAQ

Biomarker Discovery Efforts: The "Secrets" of Our Success FAQ

Why blood?

Accessibility: the brain cannot be biopsied, and blood is less painful and easier to collect than cerebrospinal fluid (CSF).

Why gene expression and not proteomics?

Gene expression reflects gene-environment interactions and underlies biological function. For discovery, it is far more powerful and precise than proteomics, as it lends itself to whole-genome arrays or sequencing, and to Convergent Functional Genomics analytics. Moreover, some RNAs are purely regulatory and are not translated into proteins, so we would miss them with proteomics. After gene expression biomarkers are discovered and validated, the levels of their protein counterparts can be assessed and tested in a hypothesis-driven fashion for clinical use, if one desires.

Why whole blood?

Ease of collection. Minimal manipulation of blood, as the different cell types are not separated, so RNA is well preserved. Ease of deployment in the field for population studies.

Why not immortalized lymphoblastoid cell lines?

Because they can be riddled with Epstein-Barr virus (EBV)-induced artifacts and cell culture passaging artifacts, and have no direct correlation with measured clinical state.

Does it matter from which cell type the biomarkers come? What about CBC variations?

On the front end (discovery), it does not matter from which blood cells the biomarkers come: we are not interested in blood cell biology, we just want to find peripheral markers. On the back end (clinical implementation), after a biomarker is discovered and validated, larger population studies need to be conducted to find normative levels for laboratory testing. At that point, it becomes important to understand which cell types the biomarkers come from, as variations in the complete blood count (CBC), for example from infections or immune disorders, could influence biomarker levels.

Why would changes in blood gene expression reflect changes in the brain?

Likely due to conserved signal transduction modules in different cell types, similar promoter regions, and common exposure to environmental factors such as stress and medications. However, the signal is buried in noise, so you need a good design and methodology for extracting it.

Why are we successful with relatively small cohorts?

1. Phenotype
We prefer to study discrete quantitative phenotypes, not broad diagnostic categories.
We prefer to focus on state markers, correlated with phenes measured at the time of blood draw, not trait markers.
We study these phenotypes (mood, hallucinations, suicidality) in high-risk populations (bipolar, schizophrenia), which provide an enriched pool.
When selecting our subjects, for reliability of the phene measure we look for intra-subject designs and/or for the convergence of internal feelings and thoughts (as measured by self-report scales) and of external actions and behaviors (as measured by external raters).

2. Cohorts
Separated by gender and by ethnicity, for homogeneity and reduction in noise.

3. Gene expression is much more powerful than genetics
One expressed gene may integrate the effects of n~10³ SNPs, as well as the effects of the environment.

4. Study design
An intra-subject design is the best, as it factors out genetic variability. We have used that in our most recent studies. You may need a cohort size of n~10¹ for gene expression studies, and n~10³ for family-based genetic studies (the closest you can come to “intra-subject” in genetics).

A case-case design is second best, as it factors out some disease-related variability. We have used that in some of our past studies. You may need n~10² for gene expression studies, and n~10⁵ for genetic studies.

A case-control design is the least good, due to population heterogeneity and noise that is not factored out, i.e. many of the changes have nothing to do directly with the phenotype you are studying. We have NOT used that. You may need n~10³ for gene expression studies, and n~10⁶ for genetic studies.
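
As a rough illustration of why an intra-subject design needs fewer subjects, here is a minimal Python sketch. All subject labels and expression values are made up for illustration; they are not data from our studies. Each hypothetical subject has a very different baseline level of one gene (standing in for genetic variability) and a small shift between a low-mood and a high-mood visit. Taking the difference within each subject removes the baseline, so the spread that remains is much smaller than the spread between subjects.

```python
from statistics import mean, stdev

# Hypothetical toy values: expression of one gene, per subject, at two visits.
# Baselines differ widely between subjects; the state-related shift is about +1.
visits = {
    "subj_a": {"low": 10.0, "high": 11.1},
    "subj_b": {"low": 25.0, "high": 25.9},
    "subj_c": {"low": 4.0, "high": 5.0},
    "subj_d": {"low": 17.0, "high": 18.0},
}


def intra_subject_differences(data):
    """High-state minus low-state expression, computed within each subject."""
    return [v["high"] - v["low"] for v in data.values()]


diffs = intra_subject_differences(visits)

# Spread of the within-subject changes (each subject's baseline cancels out).
spread_within = stdev(diffs)

# Spread of baselines across subjects, which a comparison between
# different people would have to overcome.
spread_between = stdev([v["low"] for v in visits.values()])

print(f"mean within-subject change: {mean(diffs):.2f}")
print(f"spread of within-subject changes: {spread_within:.2f}")
print(f"spread of baselines between subjects: {spread_between:.2f}")
```

In this toy example the state-related change is far smaller than the differences between people, so it would be hard to see when comparing different subjects, but it is consistent once each subject serves as their own control.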

5. Convergent Functional Genomics (CFG)
CFG is a magnet that finds the biomarker “needle” in the blood gene expression “haystack”. It uses, in a Bayesian way, the whole body of work in the field to identify and prioritize disease-related genes from the long lists of differentially expressed genes in the blood. As a bonus, the biomarkers prioritized are fit to disease, not fit to cohort. Because of that, they travel well and reproduce in independent cohorts, which is the ultimate litmus test we subject our findings to before we publish them.
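
The prioritization idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual CFG scoring: the evidence categories, their weights, and the gene names are all hypothetical, and a simple additive score stands in for the Bayesian-style use of prior evidence. Genes from the differentially expressed list are ranked by how many independent lines of prior work support them.

```python
# Hypothetical evidence lines and weights (assumptions for illustration only).
EVIDENCE_WEIGHTS = {
    "prior_human_genetic": 2.0,
    "prior_human_expression": 2.0,
    "prior_animal_model": 1.0,
}


def convergence_score(evidence):
    """Sum the weights of the prior-evidence lines that support a gene."""
    return sum(EVIDENCE_WEIGHTS[line] for line in evidence)


def prioritize(candidates):
    """Rank candidate genes by convergent prior evidence, highest score first."""
    return sorted(
        candidates,
        key=lambda gene: convergence_score(candidates[gene]),
        reverse=True,
    )


# Placeholder gene names mapped to the evidence lines that support each one.
candidates = {
    "GENE_A": {"prior_human_genetic", "prior_human_expression", "prior_animal_model"},
    "GENE_B": {"prior_animal_model"},
    "GENE_C": set(),
}

ranked = prioritize(candidates)
for gene in ranked:
    print(gene, convergence_score(candidates[gene]))
```

Because the score depends on prior evidence from the field rather than on the size of the change in the cohort at hand, a ranking of this kind favors genes that are fit to disease rather than fit to cohort.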